Abstract — Overlay network topology, together with peer/data organization and the search algorithm, is a crucial component of
unstructured peer-to-peer (P2P) networks, as these directly affect the efficiency of search on such networks. Scale-free (power-law)
overlay network topologies are among the structures that offer high performance for these networks. A key problem for these
topologies is the existence of hubs, nodes with high connectivity. Yet, the peers in a typical unstructured P2P network may not
be willing or able to cope with such high connectivity and its associated load. Therefore, hard cutoffs are often imposed
on the number of edges that each peer can have, restricting feasible overlays to limited or truncated scale-free networks. In this
paper, we analyze the growth of such limited scale-free networks and propose two different algorithms for constructing perfect
scale-free overlay network topologies at each instance of such growth. Our algorithms allow the user to define the desired
scale-free exponent (γ). They also induce low communication overhead when the network grows from one size to another. Using extensive
simulations, we demonstrate that these algorithms indeed generate perfect scale-free networks (at each step of network growth)
that provide better search efficiency in various search algorithms than the networks generated by the existing solutions.

Index Terms — Peer-to-peer networks, hard cut-off, scale-free, overlay networks, search efficiency. 
One of the significant properties of decentralized
P2P networks is the topological characteristics of the 
formed overlay network topology. In addition to the 
distribution of data to peers and the type of search 
algorithm used, performance of search queries issued 
by peers is profoundly affected by overlay topology 
(i.e. logical connectivity graph). It has been shown U 
that the performance is among the highest when the 
overlay topology is scale-free or has power-law degree 
distribution. This is because the network diameter of
such topologies is small, as it scales logarithmically
(from $O(\ln N)$ to $O(\ln \ln N)$) with network size $N$.

Even though small-world or scale-free topologies 
offer efficient search, preserving these properties in 
growing distributed P2P environments is challenging. 
There have been many efforts to build such scale-free 
overlay structures for P2P networks in a centralized 
manner. However, such centralized solutions are not
scalable due to the difficulty of obtaining and main- 
taining global knowledge in a central node. Therefore, 
recently some algorithms using only the locally avail- 
able information (i.e. neighbor peer's information) 
have been proposed. However, as expected, this has 
caused the loss of scale-freeness in the generated
overlay structures, resulting in degradation of search 
efficiency. 

A key difficulty in implementing scale-free networks
is the existence of hubs. Nodes may not be
willing or able to act as hubs because of the excessive
bandwidth and processing requirements resulting
from high connectivity. Therefore, to promote fairness
and topology acceptability [2], hard cutoffs
are often imposed on the degree of each peer, making
the topology a limited scale-free network. Clearly, these
hard cutoffs might limit the scale-freeness of the entire
topology. When the hard cutoff limit is lowered, the
diameter of the network increases, reducing the search
efficiency.

The construction of scale-free topologies with hard
cutoffs and the effects of hard cutoffs on search
efficiency were first studied by Guclu et al. in [2]. However,
the HAPA algorithm they proposed has some
deficiencies. It is only a limited version of the well-known
Barabasi-Albert (BA) [3], or preferential attachment,
algorithm, and with fully localized information
it does not produce a perfect scale-free distribution
of node degrees. Moreover, its time to converge and
its messaging overhead grow as the number of peers
in the network increases. Even though it does not
require global topology information at the time when
nodes join, it needs the total node count of the network
to be available at each node, incurring some maintenance
cost. Furthermore, it does not allow the user to set
the desired scale-free exponent (γ), which significantly
affects the performance of search algorithms.

In this paper, we address the challenges of growing 
scale-free overlay topologies with hard cutoffs. We 
first analyze the growth of limited scale-free networks.
Then, based on this analysis, we propose two different 
algorithms that construct scale-free overlay network 
topologies having the following properties: 

• High adherence to scale-freeness: Since the performance
of the applications depends on the characteristics
(e.g., diameter) of the overlay network
on which they are built, the more closely these networks
adhere to the scale-free property, the more benefit
the applications get from scale-free features.

• Parametrized: Desired post-construction parameters
(e.g., γ, which determines the search performance)
of the scale-free network can be defined by the
user.

• Practical: The proposed models support the growth
of limited scale-free networks, that is, they work even if
there is a limit on the number of edges that a node can have
in practice.

• Cost-Efficient: Effective communication between
newly joining nodes and existing ones
during the construction of the overlay topology, with
low communication overhead, decreases traffic
intensity and increases construction efficiency.

The rest of the paper is organized as follows. In
Section 2, we overview the related work. In Section 3,
we present our analysis of the growth of limited
scale-free networks, and in Section 4 we continue with
the details of the proposed growth models. Section 5
presents the simulation results. Finally, we conclude
and outline future work in Section 6.

2 Background 

2.1 Scale-free networks 

Since the discovery of the scale-free property, scale-free
networks have attracted a great deal of research
interest in many natural and artificial systems such
as the Internet [6] and scientific collaboration networks
[7]. In these networks, node degrees follow a
power-law distribution. That is, the degree distribution
of nodes does not
depend on the number of nodes in the network. The
probability that a node has degree $i$ is
$P(i) \propto i^{-\gamma}$, where the exponent is often limited to
the range $2 < \gamma < 3$. However, in limited scale-free
networks, only nodes with degrees smaller than the
achievable maximum degree comply with this rule.
We will elaborate on this later.

The growth models for scale-free networks have
been extensively studied in the last decade in network
science. The notion of 'preferential attachment'
is usually at the core of these models. It is
equivalent to the Yule process [9], which is used to model
the distribution of sizes of biological taxa. Price [10]
first applied this idea to the growth of networks under
a mechanism called 'cumulative advantage'. The
term 'preferential attachment' and its popularity
in scale-free network models date from Barabasi and
Albert's work [3], which independently rediscovered
the same growth model on the web.

In the Barabasi-Albert (BA) model [3], each joining
node selects and connects to an existing node $j$ with a
probability $p(j) = d_j / \sum_{i=1}^{n} d_i$ that is proportional
to the existing node's current degree, $d_j$. Each joining
node computes $p(j)$ for each existing node in the
network and randomly selects $k$ of them to connect
to. The network formed by the BA model has
a power-law degree distribution with $\gamma = 3$, thus
$P(i) \propto i^{-3}$. There are also other models that compute
$p(j)$ differently than the BA model does. However,
they all use the 'preferential attachment' rule. A complete
review of such models is given in ITU . 
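
As an illustration of the BA rule described above, the following minimal Python sketch grows a graph by preferential attachment. It is not taken from [3]: the function name `ba_grow`, the seed clique of $k + 1$ nodes, and the redraw-until-distinct step are assumptions made here for the example, and the redraw step only approximates selection proportional to $d_j$.

```python
import random

def ba_grow(n_final, k, seed=0):
    """Grow a graph to n_final nodes; each joining node attaches to k distinct
    existing nodes drawn with probability proportional to their degree."""
    rng = random.Random(seed)
    n0 = k + 1  # assumed seed graph: a clique, so every starting degree is positive
    neighbors = {u: set(v for v in range(n0) if v != u) for u in range(n0)}
    for new in range(n0, n_final):
        existing = list(neighbors)
        # d_j for every existing node; p(j) = d_j / sum_i d_i (global information)
        weights = [len(neighbors[j]) for j in existing]
        targets = set()
        while len(targets) < k:  # redraw until k distinct targets are found
            targets.add(rng.choices(existing, weights=weights)[0])
        neighbors[new] = set()
        for j in targets:
            neighbors[new].add(j)
            neighbors[j].add(new)
    return neighbors
```

Note that every join needs the degrees of all existing nodes, which is the global-knowledge requirement discussed in this section.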

In practical applications of scale-free networks,
there is often a hard cutoff on the degree of nodes.
Therefore, in this paper we focus on limited scale-free
networks and study growth models on such
networks.¹ This differs from most previous
work, which studies the growth of scale-free networks
with no hard cutoff limit.

Moreover, in previously proposed growth models,
there is only one way of computing the connection
probability for each node, resulting in one predefined $\gamma$
value. In this paper, we propose growth models which
define the connection probability of newly joining nodes
to existing nodes according to the network parameters
(e.g., $\gamma$) desired in the final network.

1. Yet, by setting a cutoff threshold equal to the number of nodes
in the network, our algorithms are able to grow scale-free networks
without cutoff.

In the construction of a network topology, it is
also important to do the construction efficiently. Even
though the growth of scale-free topologies has been
extensively studied, less focus has been given to the
applicability and construction overhead of growth
models. In a real network application (such as peer-to-peer
networks), growing such scale-free overlay
topologies may cause high communication overhead
between nodes. Whenever a new node joins the network,
it needs the current degree information of all
nodes (global topology information) to compute $p(j)$
for each existing node $j$ and to select the nodes to which it
will connect. In contrast to this working principle,
computing with time [17] proposes a mechanism in
which nodes self-select according to their
fitness to the task at hand. Each node computes a
maximum time delay, $t_d$, proportional to the inverse
of its fitness. It responds to the request at a time
selected uniformly at random from the interval $(0, t_d)$. The
benefit of this scheme is that the communication
needed for selection is constant in the number of
candidates [17]. In [8], Bent et al. used this mechanism
to select connections in a scale-free graph using the
degree of nodes as fitness. When a new node joins the
network, it sends a network-wide broadcast message
(using flooding) to announce its presence. Once the
new node starts receiving the responses from existing
nodes, it connects to the first $k$ responders (since each
new node connects only to $k$ of the existing nodes
at its joining time). To reduce communication, each
existing node that has already sent or forwarded $k$
different responses (of other nodes or its own) to the
newly joining node can stop forwarding any other
responses, since they will not have any chance to be
selected for connection.
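
The delay-based self-selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `first_k_responders`, the `base_delay` scaling constant, and the idealized model in which responses arrive in order of their drawn delays (no network latency) are not from [8] or [17].

```python
import random

def first_k_responders(degrees, k, rng, base_delay=1.0):
    """degrees: dict mapping node -> degree (used as fitness, must be > 0).
    Returns the k nodes whose randomly drawn response times are earliest."""
    responses = []
    for node, d in degrees.items():
        t_d = base_delay / d                 # maximum delay, inversely proportional to fitness
        responses.append((rng.uniform(0.0, t_d), node))
    responses.sort()                         # earliest response first
    return [node for _, node in responses[:k]]
```

For example, with `degrees = {"hub": 50, "leaf": 1}` and `k = 1`, the hub answers first in roughly 99% of the draws, since the leaf wins only when its delay falls below the hub's much shorter one.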

2.2 Scale-free P2P Overlay Networks 

One of the first algorithms offering scale-free overlay 
network topology for unstructured peer-to-peer net- 
works is the LLR algorithm [13]. It is a variant of 
the BA model where only the nodes in the vicinity 
of a new node try to connect to it. Even though 
this algorithm helps in decreasing the construction 
overhead and relaxes the necessity of whole topology 
information, it causes divergence from scale-free net- 
work topology. Other similar approaches include 13, 
where a self organizing scale free topology is pro- 
posed, and Qll, where robustness and fragility of 
such networks have been studied. 

The above studies either require the availability
of global network information or cannot construct
limited scale-free overlay networks. To the best of our
knowledge, the study [2] by Guclu et al. is the first
work that studies the construction of limited scale-free
overlay topologies for unstructured peer-to-peer
networks as well as the impact of hard cutoffs on
the efficiency of search algorithms. The authors propose
algorithms for building limited scale-free overlay
structures for peer-to-peer networks considering
the locality in the preferential edge assignment. For
example, in the Hop-and-Attempt Preferential Attachment
(HAPA) algorithm, each new node joining the
network first selects a random node and then attempts
to connect to it. If it cannot establish the connection (due
to the hard cutoff and the preferential selection probability)
or it needs more nodes to connect to (to fill all its
$k$ stubs), it selects a random neighbor of the currently
selected but ineligible node and again attempts to
connect to it. This continues until the new node fills
all its stubs. Even though this algorithm works locally,
it still assumes that the nodes know the total node
count ($n$) in the network. The new node selects a
random number between 0 and 1 and connects to
a visited node $j$ if that random number is less than
$p(j) = d_j / \sum_{i=1}^{n} d_i$, where the denominator is
equal to $2nk$. Moreover, since the $p(j)$ values get smaller as $n$
increases, as we will show in the simulation section,
this algorithm may cause a new node to send a
connection attempt message to many nodes in the
network before succeeding in filling all its stubs. As a
result, it may sometimes incur a cost higher than the
cost of a network-wide broadcast message. In a similar
work [4], Guclu et al. also studied the impact of ad-hocness
(mobility of peers) on the network topology
and search efficiency.



3 Analysis 

In this section, we analyze the growth of limited scale-free
networks with $n$ nodes, each with minimum
degree $k$, average degree of all nodes $2k$, and
maximum degree (hard cutoff) $m$, with the degrees
distributed according to the power law with exponent
$\gamma$. By definition, the nodes with maximum degree
form a separate group from other nodes in terms of
degree distribution:

$$P(i) = c\,i^{-\gamma} \quad \text{for nodes with degree } k \le i < m$$

$$P(m) = 1 - \sum_{i=k}^{m-1} P(i) \quad \text{for } i = m$$

where $c$ is a constant. Note that, in this definition,
only the nodes that have not yet reached degree $m$
are guaranteed to comply with the power-law degree
distribution. However, as we show later in this section,
whenever $m > 3k$, there exists a unique value
of $\gamma$ with which all node degrees have frequencies in
agreement with the power law.

Our goal is to construct a topology that shows
perfect adherence to the scale-free property. Moreover,
we want to achieve this without using any global
information. We characterize such a graph by its
parameters $n$, $m$, $k$ and $\gamma$, defined above. It is easy
to show that the following inequalities must hold:
$m > 2k$ (we exclude here the trivial case of $m = 2k$, in
which all nodes of the graph are of degree $m$, trivially
satisfying the definition of a power-law distribution of
node degrees), $\gamma > 0$, and $n > 2k$. We are interested
in generated graphs with the number of nodes in the
range $2k < n < n_{max}$, and we assume that $n_{max} \gg k$.

In this paper, we assume a constant integer $k$ for
the number of edges added by each joining node.
However, it is a matter of simple extension to have
instead a vector $[k_i]$ of expected frequencies with
which $i$ edges are added with the newly added node,
such that $k = \sum_{i} i\,k_i$.

The three constants $k$, $m$ and $\gamma$ are independent of each
other, except that for certain values of $m$ and $k$ there
is a lower bound on $\gamma$.

Let $n_i$ denote the number of nodes with degree $i$
in the network with $n$ nodes. By enumeration of all
nodes and edges:

$$n = \sum_{i=k}^{m} n_i \quad \text{and} \quad 2kn = \sum_{i=k}^{m} i\,n_i \tag{1}$$

Substituting $n$ in the above equations, and taking
$n_m$ out, we get:

$$n_m = \frac{1}{m-2k} \sum_{i=k}^{m-1} (2k-i)\,n_i \tag{2}$$

The power-law degree distribution yields:

$$n_i = \frac{cn}{i^{\gamma}} \quad \text{for } i < m \tag{3}$$

Using Eq. 3 to substitute in Eq. 2, we get:

$$n_m = \frac{cn}{m-2k} \sum_{i=k}^{m-1} \frac{2k-i}{i^{\gamma}} \tag{4}$$

Using the enumeration of nodes ($n_m + \sum_{i=k}^{m-1} n_i = n$)
with different node degrees, we can compute the
constant $c$ as:

$$c = \frac{m-2k}{\sum_{i=k}^{m-1} \frac{m-i}{i^{\gamma}}} \tag{5}$$

Note that in limited scale-free networks we cannot
enforce the power-law distribution for the nodes
with maximum degree $m$ because their frequency is
defined by Eq. 4. However, for a given $m$ and $k$,
nodes with maximum degree will also have the frequency
defined by the power law ($n_m = cn/m^{\gamma}$) if $\gamma$ satisfies:

$$\frac{m-2k}{m^{\gamma}} = \sum_{i=k}^{m-1} \frac{2k-i}{i^{\gamma}} \tag{6}$$

Since $m > 2k$, the left-hand side of Eq. 6 is always
positive; its derivative with respect to $\gamma$ is $-\ln(m)\,(m-2k)/m^{\gamma}$,
while its value approaches $(1 - 2k/m)\,m^{-\gamma+1}$ when $\gamma$
tends to infinity. The right-hand side of this equation
can be initially negative, but for large $\gamma$ it must be
positive. Its value approaches $k^{-\gamma+1}$ when $\gamma$ tends to
infinity, and it has the derivative $-\sum_{i=k}^{m-1} \ln(i)\,(2k-i)/i^{\gamma}$.
It is easy to show that the right-hand side decreases
more slowly than the left-hand side, and therefore at most
one value of $\gamma$ can satisfy Eq. 6. The unique
solution exists if and only if, for $\gamma = 0$, the right-hand
side is smaller than the left-hand side,
$m - 2k > 2k(m-k) - (m-1)m/2 + k(k-1)/2$, which reduces to
$(m - 2k + 1/2)^2 > k^2 + k - 1/2$, and since $k^2 < k^2 + k - 1/4 < (k + 1/2)^2$,
we get $m \ge 3k$. Thus, only for $m$
greater than or equal to $3k$ does there exist a unique value of $\gamma$
for which the constructed graph will have a power-law
distribution of all node degrees (including the nodes
with maximum degree $m$).
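
The value of the exponent that balances Eq. 6 can be located numerically. The sketch below uses plain bisection on the difference between the two sides, (m − 2k)/m^γ minus the sum of (2k − i)/i^γ over i = k..m−1; the function names `eq6_gap` and `matching_gamma`, the search bound `hi`, and the choice of bisection are assumptions made for illustration and are not part of the analysis in the paper.

```python
def eq6_gap(k, m, gamma):
    """Left-hand side minus right-hand side of Eq. 6 for the given exponent."""
    lhs = (m - 2 * k) / m**gamma
    rhs = sum((2 * k - i) / i**gamma for i in range(k, m))
    return lhs - rhs

def matching_gamma(k, m, hi=50.0, iterations=200):
    """Return a positive exponent at which the two sides of Eq. 6 coincide,
    or None when the gap is not positive at gamma = 0 (no sign change to bracket)."""
    if eq6_gap(k, m, 0.0) <= 0.0 or eq6_gap(k, m, hi) >= 0.0:
        return None
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if eq6_gap(k, m, mid) > 0.0:
            lo = mid      # left-hand side still larger: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For instance, `matching_gamma(1, 4)` returns a value between 0.8 and 0.9, while `matching_gamma(2, 5)` returns `None` because the gap is already negative at an exponent of zero.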

Now, we will work on the general case in which
the frequency of nodes with maximum degree does
not need to comply with the power-law distribution.
To be independent of the graph size $n$, we will use the
frequency $f_i = n_i/n$ of nodes with degree $i$. Then,
substituting $c$ in the definition of $n_i$ with Eq. 5, we get:

$$f_i = \frac{m-2k}{i^{\gamma} \sum_{j=k}^{m-1} \frac{m-j}{j^{\gamma}}} \quad \text{for } i < m \tag{7}$$

and

$$f_m = 1 - \sum_{i=k}^{m-1} f_i \tag{8}$$

Eq. 7 and Eq. 8 express the frequencies $f_i$ as simple
functions of $m$, $k$ and $\gamma$.
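
To make these formulas concrete, the short Python sketch below evaluates the constant c of Eq. 5 and the frequencies of Eqs. 7 and 8 for given cutoff, minimum degree and exponent. The function name `degree_frequencies` and the dictionary return type are choices made here for illustration only.

```python
def degree_frequencies(k, m, gamma):
    """Return {i: f_i} for i = k..m, following Eqs. (5), (7) and (8)."""
    denom = sum((m - j) / j**gamma for j in range(k, m))
    c = (m - 2 * k) / denom                      # Eq. (5)
    f = {i: c / i**gamma for i in range(k, m)}   # Eq. (7): f_i = c / i^gamma for i < m
    f[m] = 1.0 - sum(f.values())                 # Eq. (8): the remainder sits at the cutoff
    return f
```

A quick sanity check on the result is that the frequencies sum to 1 and that the mean degree, the sum of i times f_i, equals 2k, as required by the edge count in Eq. 1.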

Let us now consider the growth of the graph from
its size of $n$ nodes to the size of $n + 1$ nodes. The
added node has $k$ edges originating from it, which are
then connected to existing nodes, so on average
it increases by 1 the number of nodes with degree $k$,
i.e., $n'_k = n_k + 1$.

Fig. 1. Maximum cutoff ($m$) values for different $k$'s in a
perfectly growing scale-free graph.

Let $a_i$ denote the average number of nodes that
increase their degree from $i$ to $i + 1$ in one step of
growth by connecting to a newly added node (so the
number of nodes of degree $i$ decreases
by $a_i$, while the number of nodes with degree $i + 1$
increases by $a_i$).
Of course, each existing node can add at most one
connection to a newly added node. Hence, we have:

$$f_k (n + 1) = f_k n + 1 - a_k \tag{9}$$

which yields:

$$a_k = 1 - f_k \tag{10}$$

Similarly, $f_i = a_{i-1} - a_i$ for $k < i \le m - 1$, so by
induction:

$$a_i = 1 - \sum_{j=k}^{i} f_j \quad \text{for } k \le i \le m - 1 \tag{11}$$

Finally:

$$a_{m-1} = f_m \tag{12}$$

All frequencies must be positive. For that to
hold², it is necessary and sufficient that $n_m > 0$ or
$\sum_{i=k}^{m-1} \frac{2k-i}{i^{\gamma}} > 0$, which can be rewritten as:

$$2k \sum_{i=k}^{m-1} \frac{1}{i^{\gamma}} > \sum_{i=k}^{m-1} \frac{1}{i^{\gamma-1}} \tag{13}$$

If this condition is not satisfied, it is always sufficient
either to appropriately increase $\gamma$ or $k$, or to sufficiently
decrease $m$. Other changes to these parameters
may or may not, depending on the particular values
of the parameters, also cause the inequality of Eq. 13
to be satisfied. It is easy to notice that for $\gamma > 3$ this
inequality is satisfied for arbitrary $m$ and $k$. Fig. 1
plots the maximum values of $m$ for given $\gamma$ and $k$
values. It confirms that the maximum value of $m$ goes
to infinity for $\gamma > 3$.
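
The growth rates and the positivity condition can be checked numerically with a small Python sketch. It assumes the reading of the analysis in which a_i = 1 − (f_k + ... + f_i) for k ≤ i ≤ m−1 and in which the condition of Eq. 13 is equivalent to the sum of (2k − i)/i^γ over i = k..m−1 being positive; the function names `attachment_rates` and `satisfies_eq13` are introduced here for illustration.

```python
def attachment_rates(k, m, gamma):
    """Return {i: a_i} for i = k..m-1, where a_i = 1 - sum_{j=k}^{i} f_j."""
    denom = sum((m - j) / j**gamma for j in range(k, m))
    c = (m - 2 * k) / denom          # Eq. (5)
    rates, cumulative_f = {}, 0.0
    for i in range(k, m):
        cumulative_f += c / i**gamma  # add f_i from Eq. (7)
        rates[i] = 1.0 - cumulative_f
    return rates

def satisfies_eq13(k, m, gamma):
    """True when 2k * sum(i^-gamma) > sum(i^(1-gamma)) over i = k..m-1."""
    left = 2 * k * sum(i ** (-gamma) for i in range(k, m))
    right = sum(i ** (1 - gamma) for i in range(k, m))
    return left > right
```

Because each joining node brings exactly k edges, the rates returned by `attachment_rates` should add up to k, which gives a convenient consistency check.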



2. Extended details and proofs are presented in our technical 
report E]. 
Algorithm 1 AddNode(algoType, k, m)
 1: n++
 2: if algoType = SRA then
 3:   numOfEdges = 0
 4:   while (numOfEdges < k) do
 5:     Pick a random number r in [0,1)
 6:     d[numOfEdges] = { d | v_{d-1} < r < v_d in Eq. 14 }
 7:     numOfEdges++
 8:   end while
 9:   Broadcast a message with d
10:   Add edge to the first k responders
11: else if algoType = SDA then
12:   Broadcast a presence message
13:   All existing nodes receiving broadcast message
      increase total node count information by 1
14:   Nodes with degrees in SDA_degList[n×k] . . .
      SDA_degList[n×k+k-1] respond with their id
      and the total node count information
15:   Add edge to first responder from each degree
16:   Store the total node count information received
      from responders
17: end if



4 Proposed Growth Models 

We propose two different algorithms: (i) SRA and (ii)
SDA. Algorithm 1, which is run at each node as
new nodes join the network, shows the steps of each
algorithm. The first algorithm (SRA) needs a one-time
precomputation of the v[] array using the formulations
in the previous section, and the second algorithm needs
the SDA_degList[] array, which is also computed once,
by Algorithm 2, before node joins start.



4.1 Semi-Randomized Growth Algorithm (SRA) 

The rationale behind SRA (lines 2-10 in Algorithm 1)
is to define the probability ranges (i.e., expected
frequencies) for the selection of degrees of existing
nodes to which the newly joining nodes will connect,
such that the final degree distribution of all nodes in
the network fits the desired scale-free degree
distribution. The algorithm starts with an initial
configuration of a fully connected graph of $2k + 1$ nodes.
When a new node joins the network, it randomly
decides the degrees of the nodes to which it will connect by
generating $k$ random numbers, $r_1 \ldots r_k$, each in the range
[0,1], and finding the degree to which each random number
corresponds. We define the $v_i$'s (probability ranges) as:

$$v_i = v_{i-1} + \frac{a_i}{k} \quad \forall\, k \le i \le m - 1 \tag{14}$$

where $v_{k-1} = 0$ and the $v_i$'s are computed at each node
only once at the beginning (before node joins)
using the $a_i$ formulas in the previous section. The random
number $r_i$ corresponds to the degree $l$ such that
$v_{l-1} < r_i < v_l$ is satisfied. Having computed all $k$ node
degrees to which it wants to connect, the new node
then broadcasts a message with these degree values.
Once the existing nodes in the network receive such a
message, the nodes of the desired degrees respond to
the new node to establish a connection to it. Then, the
new node selects the first $k$ of the nodes with the desired
degrees and connects to them. If nodes with the
desired degrees do not exist yet, which is likely only at
the earlier stages of network growth, until the first
node reaches degree $m$, the new node broadcasts
a special request for lower/higher degree nodes³
after the period of response for the original broadcast
passes. Note that once the $v$ array is known in advance,
SRA has a complexity of $O(nmk)$ to grow a
network of $n$ nodes.
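
The degree-selection step of SRA can be sketched as follows. The sketch assumes that the thresholds accumulate a_i/k starting from zero (so that the last threshold equals 1) and uses a binary search in place of the linear scan implied by the O(nmk) bound; the names `sra_thresholds` and `pick_target_degrees` are hypothetical, and the broadcast and response phases are not modeled.

```python
import bisect
import random

def sra_thresholds(k, m, gamma):
    """Return (degrees, v): degrees k..m-1 and their cumulative thresholds."""
    denom = sum((m - j) / j**gamma for j in range(k, m))
    c = (m - 2 * k) / denom               # Eq. (5)
    degrees, v = [], []
    cumulative_f, threshold = 0.0, 0.0    # threshold starts at v_{k-1} = 0
    for i in range(k, m):
        cumulative_f += c / i**gamma      # f_i, Eq. (7)
        a_i = 1.0 - cumulative_f          # Eq. (11)
        threshold += a_i / k              # assumed form of Eq. (14)
        degrees.append(i)
        v.append(threshold)
    return degrees, v

def pick_target_degrees(degrees, v, k, rng):
    """Draw k random numbers in [0, 1) and map each to a target degree."""
    picks = []
    for _ in range(k):
        r = rng.random()
        idx = bisect.bisect_right(v, r)   # first index whose threshold exceeds r
        picks.append(degrees[min(idx, len(degrees) - 1)])  # guard against round-off in v[-1]
    return picks
```

A joining node would call `pick_target_degrees` once and broadcast the resulting list; the thresholds themselves need to be computed only once per parameter setting.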



4.2 Semi-Deterministic Growth Algorithm (SDA) 

Algorithm 2 SDA_ConnectionOrder(f[], k, m)
 1: for (i=k-1; i<m; i++) do
 2:   freq[i] = f[i]/k
 3:   n[i] = 0
 4:   score[i] = 2k × 2k × f[i]
 5: end for
 6: n[2k-1] = 2k
 7: for (cur=k; cur<maxNodeCount; cur++) do
 8:   for (c=0; c<k; c++) do
 9:     best = maxNodeCount
10:     desired_degree = k-1
11:     for (i=k-1; i<m-1; i++) do
12:       a1 = n[i-1] - score[i-1] - freq[i-1]
13:       a2 = n[i] - score[i] - freq[i]
14:       b1 = n[i-1] - 1 - score[i-1] - freq[i-1]
15:       b2 = n[i] + 1 - score[i] - freq[i]
16:       if i = k-1 then
17:         a1++; a2++
18:       end if
19:       current = -|a1| - |a2| + |b1| + |b2|
20:       if ((best > current) & (n[i] > 0)) then
21:         best = current
22:         desired_degree = i
23:       end if
24:     end for
25:     SDA_degList[cur×k+c] = desired_degree+1
26:     n[desired_degree]--
27:     n[desired_degree+1]++
28:     for (i=k-1; i<m; i++) do
29:       score[i] = score[i] + freq[i]
30:     end for
31:   end for
32: end for
33: return SDA_degList



3. Such a broadcast will be run only a limited number of times 
over initial small network, so its impact on communication over- 
head is negligible. Since initially nodes with degree 2k exist, the 
search will go in that direction. 
In the second algorithm, we aim to adhere as
closely as possible to the desired scale-free network
degree distribution at each step of the construction
(i.e., for each intermediate network created during the
construction). To this end, the SDA_degList[] array is
precomputed once using Algorithm 2 and used at
the run time of SDA (lines 11-16 in Algorithm 1). In a
network with $n$ nodes, for each edge of a newly joining
node, Algorithm 2 decides which degree (of nodes)
accepting such an edge will minimize the divergence of
the resulting degree distribution from the scale-free
distribution. Let n[i] denote the current count of nodes with
degree $i + 1$; f[i] denotes the expected frequency of
degree-$(i + 1)$ nodes (in a perfect scale-free topology) as
calculated in the analysis section, and freq[i] = f[i]/k is the
expected increment in the degree-$(i + 1)$ node count with
only one edge addition to a new node. score[i] denotes
the current expected count of degree-$(i + 1)$ nodes
at the given node count, and score[i] + freq[i] denotes
the expected count of degree-$(i + 1)$ nodes after one
new edge assignment to the new node. The algorithm
starts with a fully connected graph of $2k + 1$ nodes and
updates the score[i] values with each edge addition (line
29 in Algorithm 2). With the score[i], freq[i] and n[i]
values known for the current network, the algorithm
computes the sum of absolute differences between the
current (n[i]) and expected (score[i] + freq[i]) counts
of nodes with each degree that adheres to the power-law
distribution. Let $f(i)$ = n[i] − score[i] − freq[i] denote
the difference between the current count and the expected
count of degree-$(i + 1)$ nodes in the network. Then, the change
in the sum of absolute differences over all degrees
($\sum_{k \le j \le m} |f(j)|$) in
case of selecting degree $i + 1$ can be computed as:

$$d(i) = -|f(i-1)| - |f(i)| + |f(i-1) - 1| + |f(i) + 1|$$

Once the algorithm finds the degree that decreases
this sum the most (i.e., which provides the best
adherence to the scale-free distribution), it connects an
edge from the new node to one of the nodes with
that degree and updates the node counts (lines 26-27
in Algorithm 2).

Note that the SDA algorithm deterministically finds
the degree to which the newly joining node should
connect at each current total node count. However,
the new node can select any node with that degree
to connect to (during run time), making the algorithm
'semi-deterministic'. Once each node runs
Algorithm 2 one time and gets SDA_degList[] as
output, it can decide its action (i.e., whether to respond
or not for connection) every time it receives a newly
joining node's broadcast message at run time. It
checks the degree values in SDA_degList[n × k] . . .
SDA_degList[n × k + k − 1] using the current node count
$n$. If its degree is on the list, it responds to the
new node with its ID and the total node count
information in the network⁴, such that the new node
can also learn the current total node count in the
network. The complexity of SDA is $O(nmk) + O(nk)$,
where the former is the cost of obtaining SDA_degList[]
from Algorithm 2 and the latter is the complexity of
growing a network of $n$ nodes, making the overall
SDA complexity $O(nmk)$. Hence, the complexity of the
proposed algorithms matches the complexity of Gaian
($O(nk)$) with constant $m$, and it is lower than the
complexity of HAPA⁵ and BA ($O(n^2 k)$).
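
The greedy idea behind SDA can be illustrated with a simplified Python sketch. It is not a transcription of Algorithm 2: it indexes counts directly by degree, compares the current counts against target counts f_d · (n + 1), and forbids one node from accepting two edges in the same join. The names `target_frequencies` and `sda_schedule` and these bookkeeping choices are assumptions made here.

```python
def target_frequencies(k, m, gamma):
    """Target frequencies f_d for d = k..m (Eqs. 5, 7 and 8)."""
    denom = sum((m - j) / j**gamma for j in range(k, m))
    c = (m - 2 * k) / denom
    f = {d: c / d**gamma for d in range(k, m)}
    f[m] = 1.0 - sum(f.values())
    return f

def sda_schedule(f, k, m, n_final):
    """Return (schedule, count): for every join, the k degrees whose nodes
    should accept an edge, and the final number of nodes per degree."""
    count = {d: 0 for d in range(k, m + 1)}
    count[2 * k] = 2 * k + 1                 # seed clique: every node has degree 2k
    schedule = []
    for n in range(2 * k + 1, n_final):      # n = number of nodes before this join
        target = {d: f[d] * (n + 1) for d in count}
        avail = dict(count)                  # nodes that may still accept an edge now
        count[k] += 1                        # the joining node ends with degree k
        picks = []
        for _ in range(k):
            best_d, best_delta = None, None
            for d in range(k, m):            # nodes at the cutoff m accept no edges
                if avail[d] == 0:
                    continue
                before = abs(count[d] - target[d]) + abs(count[d + 1] - target[d + 1])
                after = abs(count[d] - 1 - target[d]) + abs(count[d + 1] + 1 - target[d + 1])
                if best_delta is None or after - before < best_delta:
                    best_d, best_delta = d, after - before
            if best_d is None:
                raise RuntimeError("no eligible node below the cutoff")
            avail[best_d] -= 1
            count[best_d] -= 1
            count[best_d + 1] += 1           # the chosen node moves up one degree
            picks.append(best_d)
        schedule.append(picks)
    return schedule, count
```

As in the paper, the schedule depends only on the parameters and the node count, so every node can compute the same list independently before any joins occur.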

5 Simulation 

In this section, we compare the proposed algorithms 
with well-known previous algorithms in terms of (i) 
the goodness of scale-free distribution, (ii) the effect 
of using global information vs. not using it, (iii) the 
search efficiency in different search algorithms, and 
(iv) the messaging overhead and complexity incurred 
during the construction. To this end, we study three 
different search algorithms: flooding (FL), normalized 
flooding (NF), and random walk (RW). In FL, source 
(i.e., query originating) node initiates the search by 
sending a message to its first hop neighbors. If the 
neighbor nodes receiving this message do not possess 
the requested item, they forward this message to their 
own neighbors, excluding the node from which they 
received the message. This type of forwarding process 
is repeated by each node that receives the message 
and does not have the requested item. In NF [15], a
node receiving the query message only forwards it to 
k (i.e., minimum degree of all nodes) of its neighbors 
in case it does not have the item in its repository. 
If a node has more than k neighbors, it randomly 
selects only k of them and forwards the message to 
them, excluding the one that sent the message to this 
node. Finally, in RW [16], a node receiving the query 
message and not possessing the requested item selects 
a single random neighbor and forwards the message 
to it. RW can also be considered as a special case of 
NF with a virtual minimum degree of 1. In all search 
algorithms, the forwarding of the message either stops 
after a predefined forwarding limit, which is called 
time-to-live (TTL), or the item is found at the current 
node. We also assume that search items are uniformly 
distributed among all nodes. Extensive details of these 
search algorithms can be found in [2].
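
The three search schemes differ only in how many neighbors receive the forwarded query, so they can be expressed with one hypothetical helper. In the sketch below, the function name `search`, the `fanout` parameter, the hop-synchronous processing, and the absence of duplicate suppression are simplifying assumptions, not details taken from [2], [15] or [16].

```python
import random

def search(adj, source, holder, ttl, fanout=None, rng=None):
    """adj: dict node -> list of neighbors. holder: node storing the item.
    fanout=None gives flooding, fanout=k normalized flooding, fanout=1 a random walk.
    Returns (found, messages_sent); stops after the hop in which the item is found."""
    rng = rng or random.Random(0)
    if source == holder:
        return True, 0
    frontier = [(source, None)]          # (node, neighbor that sent it the query)
    messages = 0
    for _ in range(ttl):
        found, next_frontier = False, []
        for node, sender in frontier:
            candidates = [v for v in adj[node] if v != sender]
            if fanout is not None and len(candidates) > fanout:
                candidates = rng.sample(candidates, fanout)
            for v in candidates:
                messages += 1
                if v == holder:
                    found = True         # the holder does not forward further
                else:
                    next_frontier.append((v, node))
        if found:
            return True, messages
        frontier = next_frontier
    return False, messages
```

On a path graph `0-1-2-3`, for example, `search(adj, 0, 3, ttl=3)` finds the item after three messages, while the same call with `ttl=2` gives up after two.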

5.1 Simulation Results 

To compare the proposed growth models with existing
algorithms, we generated different topologies (consisting
of $n$ nodes) using different $k$, $m$ and $\gamma$ values.
We start with a fully connected network of $2k + 1$
nodes and add new nodes to the network following
the connection mechanism of each growth model.
Each new node selects $k$ of the existing nodes (which
have not reached their maximum edge limit) according
to the algorithm in use and connects to them.

4. In the current setting, we assume fault-free communication
between nodes.

5. The complexity of HAPA is not assessed in [2], but the number
of steps required in HAPA to find the next node to connect to grows
with $n$, making the complexity higher than $O(nk)$.

Fig. 2. Degree distributions in different growth models ($n = 5 \times 10^4$).

The algorithms we compare in the simulations are
listed in Table 1. As the table shows, the BA algorithm
needs global topology information (degrees of all
nodes). Even though the HAPA algorithm does not
need the degree of each node, it still needs global
knowledge of the total node or edge count in the network.
On the other hand, Gaian and our first algorithm,
SRA, do not use any global knowledge. Our second
algorithm, SDA, uses only the total node count, as does
HAPA. While all other algorithms can only generate
a network with a fixed degree distribution exponent ($\gamma$)
due to their designs, both of our algorithms can create
topologies with a desired exponent (and hence with desired
network properties, such as diameter).

In Fig. 2, we show the degree distributions in topologies
constructed by the compared algorithms. Since
our algorithms can produce scale-free networks with
a desired exponent $\gamma$, we generated several network
topologies with different $\gamma$ values. However, the other
algorithms can yield a network only with a single $\gamma$
value. When we look at the degree distributions in
topologies created by SDA in Fig. 2, we clearly
observe that the degree distributions perfectly match
the desired degree distribution for the $\gamma$ values used
in the construction. Similarly, there is quite a good
match between the degree distributions of the SRA algorithm
and the predefined $\gamma$ value used in the construction.
We only see a slight curve towards the end of the
lines (high-degree nodes) when $m = 50$ in Fig. 2.
This is typically due to the increasing impact
of the randomness used in the SRA algorithm and the
insufficient number of nodes in the final network as
the value of $\gamma$ increases. These minor curves disappear
if the number of nodes in the network increases or the
$\gamma$ value decreases (there is no such curve in the Fig. 2
panel where $\gamma = 2.5$).

Algorithm    Global knowledge used    Flexible exponent (γ)
BA [3]       Degrees of all nodes     No
HAPA [2]     Total node count         No
Gaian [8]    None                     No
SRA          None                     Yes
SDA          Total node count         Yes

TABLE 1
Comparison of Growth Models

Fig. 3. SRA search results

On the other hand, when we look at the degree
distributions of the other algorithms, we cannot see
a good scale-free distribution, even though some of
them use global topology information during their
construction phase. The degree distribution line is
either a concave or a convex curve rather than a
straight line. This result is expected: these
algorithms were originally designed for general
scale-free networks without hard cutoffs and only
later adjusted to limited scale-free networks. Thus,
these figures clearly show that these algorithms are
not able to create perfect limited scale-free
topologies.
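
As a rough illustration of the straight-line criterion used above, the following sketch fits a least-squares line to the log-log degree counts and reports its slope and R². It is not the fitting procedure of the paper; the function name `loglog_fit` and the sample counts are hypothetical and serve only to show that an ideal power law gives a straight line (R² = 1, slope = −γ), whereas a curved distribution gives a lower R².

```python
import math

def loglog_fit(degree_counts):
    """Least-squares line through the points (log d, log count).

    degree_counts maps a degree d to the number of nodes with that degree.
    Returns (slope, r_squared). For an exact power law, slope = -gamma and
    r_squared = 1; concave or convex curves give r_squared below 1.
    """
    pts = [(math.log(d), math.log(c)) for d, c in degree_counts.items() if c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Hypothetical ideal counts for gamma = 3 over degrees 2..20 (k = 2, m = 20).
ideal = {d: 1e6 * d ** -3.0 for d in range(2, 21)}
# Hypothetical non-power-law counts (exponential decay) for comparison.
curved = {d: math.exp(-0.3 * d) for d in range(2, 21)}

slope_ideal, r2_ideal = loglog_fit(ideal)
slope_curved, r2_curved = loglog_fit(curved)
```

A regression on log-log axes is only a visual aid; it is known to be a weak estimator of the exponent, which is why a likelihood-based fit is preferable for quantitative comparison.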



Method             SDA (γ=3.0)  SRA (γ=3.0)  BA       HAPA     Gaian
γ (m=20)           2.99187      2.97974      2.32319  3.04869  2.322818
ChiSquare (m=20)   0.00001      0.00064      0.00243  0.00435  0.00246
γ (m=50)           3.00057      2.97269      2.46555  3.31755  2.465575
ChiSquare (m=50)   0.00007      0.00366      0.00494  0.01197  0.00498

TABLE 2
Results of fitness analysis



To quantify the fitting quality of the distributions,
we followed [18] and used the maximum likelihood
estimation (MLE) method to compute the best-fitting γ
for the results obtained from each algorithm. Then,
using ChiSquare statistics, we computed the
probability that the fit of the data to a power-law
distribution with this γ is random, so the smaller
the value, the better the fit. Table 2 shows the
results for k = 2 and m ≥ 20 (see footnote 6). The
results show that SDA has superior performance, with
a divergence from the desired γ of at most 0.008 and
a very low probability, 0.00007, of the match being
random. Next in performance is SRA, with a divergence
from γ that is about three times larger (0.028) and a
probability of a random match that is about 50 times
higher (0.00366). The compared methods have at least
8 (BA and Gaian) to 16 (HAPA) times larger divergence
(see footnote 7 on BA), and a probability of random
fit from 70 to 170 times higher than SDA. The
comparisons of performance with k = 3, not shown for
the sake of brevity, yielded similar results.

6. We used MLE for cases with m ≥ 20 since it does not
work well for small samples [18].
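
The fitting step above can be sketched as follows. This is a minimal illustration, not the exact procedure of [18]: it assumes that degrees follow a discrete power law truncated to the range k..m, finds the maximizing γ with a golden-section search, and computes a Pearson chi-square statistic against the fitted distribution. The function names, the search bounds and the sample counts are hypothetical, and the conversion of the statistic into a probability is left out.

```python
import math

def log_likelihood(gamma, counts, k, m):
    """Log-likelihood of degree counts under P(d) ~ d**-gamma for k <= d <= m."""
    log_norm = math.log(sum(d ** -gamma for d in range(k, m + 1)))
    n = sum(counts.values())
    weighted_logs = sum(c * math.log(d) for d, c in counts.items())
    return -gamma * weighted_logs - n * log_norm

def fit_gamma(counts, k, m, lo=1.01, hi=6.0, tol=1e-6):
    """Golden-section search for the gamma that maximizes the likelihood.

    The search bounds lo and hi are assumptions made for this sketch.
    """
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if log_likelihood(x1, counts, k, m) > log_likelihood(x2, counts, k, m):
            b = x2  # the maximum lies in [a, x2]
        else:
            a = x1  # the maximum lies in [x1, b]
    return (a + b) / 2

def chi_square(counts, gamma, k, m):
    """Pearson chi-square statistic of observed vs. expected degree counts."""
    n = sum(counts.values())
    norm = sum(d ** -gamma for d in range(k, m + 1))
    stat = 0.0
    for d in range(k, m + 1):
        expected = n * d ** -gamma / norm
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Hypothetical sample: near-ideal counts for gamma = 3 with k = 2, m = 20.
sample = {d: round(1e7 * d ** -3.0) for d in range(2, 21)}
gamma_hat = fit_gamma(sample, k=2, m=20)
stat = chi_square(sample, gamma_hat, k=2, m=20)
```

The golden-section search is sufficient here because the log-likelihood of this one-parameter family has a single maximum in γ.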

Next, we look at the performance of the proposed
algorithms in terms of search efficiency and compare
them with the other algorithms using different types
of searches (in a network with n = 10,000 nodes). As
Fig. 3a shows, the number of hits with a given TTL
value increases in the SRA algorithm as the γ value
used increases. Moreover, the improvement becomes
more visible as the limit on the degree of nodes, m,
increases. On the other hand, Fig. 3b-c show that as
m and γ decrease, the NF and RW search efficiency
increases, unlike the FL search performance. The
impact of m and γ on the SDA algorithm is similar to
their impact on SRA; thus, we do not present those
results, for brevity.
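
To make the hits-versus-TTL measure concrete, the sketch below counts the nodes reached by a flooding (FL) search within a given TTL. It assumes that a "hit" is a distinct node reached by the query and that the overlay is given as an adjacency dictionary; both are assumptions of this illustration, and the function name `flooding_hits` is hypothetical.

```python
def flooding_hits(graph, source, ttl):
    """Count distinct nodes reached by flooding from source within ttl hops.

    graph maps each node to a list of its neighbors. Every node that
    receives the query for the first time forwards it to all neighbors.
    """
    visited = {source}
    frontier = [source]
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(visited) - 1  # the source itself is not counted as a hit

# Hypothetical toy overlays: a path 0-1-2-3-4 and a star with hub 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
```

On the star overlay, a query issued at a leaf reaches every other node within two hops, which illustrates why high-degree nodes help flooding for a fixed TTL.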

Fig. 4a-b show the comparison of FL search efficiency
in all algorithms with m = 10 and m = 50,
respectively. The SRA algorithm with γ = 3.5 achieves
the best hit ratios for a given TTL value in either
case, while SDA with γ = 3.5 (and the HAPA algorithm
when m = 50) achieves the second best. The BA and
Gaian algorithms have much lower performance.

The NF results in Fig. 4c-d reveal some interesting
trends. Since our algorithms show the best NF search
efficiency as γ decreases, we set γ = 1.53 (its lowest
possible value) when m = 10 and γ = 2.46 when m = 50
(which also generates a fully perfect scale-free
topology, where nodes with degree m also comply with
the scale-free distribution). The SRA and SDA
algorithms have similar performance in both cases;
their performance is up to 12% better than that of the
BA and Gaian algorithms when m = 10 and similar to
them when m = 50. They cannot perform better when
m = 50 because of the larger minimum possible γ (2.46)
in the given (k = 2, m = 50) setting. Note that HAPA
has the worst NF search efficiency among all
algorithms.

Fig. 4e-f show the RW-based search results. Since RW
needs more TTL to reach the destination node than NF,
we applied the same normalization as used by previous
work [2] to obtain the results of RW. That is, for a
fair comparison we set the TTL value of the RW search
to the number of messages generated during the NF
search in the same setting. For example, the results
in the RW graphs with a TTL value of x show the number
of hits achieved by an RW search that uses the same
number of messages as an NF search with TTL x. The
comparison of the algorithms in terms of RW-based
search efficiency leads to conclusions similar to
those for NF-based search efficiency. We see that our
algorithms perform better than all other algorithms
when m = 10, and similar to the BA and Gaian
algorithms and better than HAPA when m = 50.

7. Using γ = 3 − 2k/m for limited BA, derived from [2],
gives an even larger (around 50 times) divergence.

Fig. 4. Search efficiency results: FL (a-b), NF (c-d) and RW (e-f)
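
The RW normalization described above can be sketched as follows. The sketch assumes that NF forwards the query from each newly reached node to at most k randomly chosen neighbors (excluding the sender), that RW spends one message per hop, and that a "hit" is a distinct node reached. These details, as well as all function names, are assumptions of this illustration rather than the exact simulation setup.

```python
import random

def nf_search(graph, source, ttl, k, rng):
    """NF sketch: each newly reached node forwards to at most k random neighbors.

    Returns (hits, messages), where messages counts every forwarded query.
    """
    visited = {source}
    frontier = [(source, None)]
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            candidates = [nb for nb in graph[node] if nb != sender]
            for nb in rng.sample(candidates, min(k, len(candidates))):
                messages += 1
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append((nb, node))
        frontier = next_frontier
    return len(visited) - 1, messages

def rw_search(graph, source, steps, rng):
    """Random walk of a fixed number of hops; returns distinct nodes reached."""
    visited = {source}
    node = source
    for _ in range(steps):
        node = rng.choice(graph[node])
        visited.add(node)
    return len(visited) - 1

def normalized_rw_hits(graph, source, ttl, k, rng):
    """RW hits when RW is given the message budget of NF with the same TTL."""
    _, budget = nf_search(graph, source, ttl, k, rng)
    return rw_search(graph, source, budget, rng)

# Hypothetical toy overlay: a ring of 10 nodes, each with degree 2.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
nf_hits, nf_messages = nf_search(ring, 0, 3, 2, random.Random(0))
rw_hits = normalized_rw_hits(ring, 0, 3, 2, random.Random(0))
```

Plotting `rw_hits` against the NF TTL, rather than against the number of RW hops, is what puts the two searches on the same message budget.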

We also compared the algorithms in terms of the
communication overhead (e.g., number of messages)
during the construction of a scale-free network in
Fig. 5. In all algorithms except HAPA, when a node
wants to join the network, it sends a broadcast
message to announce its presence. Then, in the BA
algorithm, every node sends its current degree count
back to the new joining node. In the Gaian algorithm,
each node sends (or forwards) at most k messages
(containing the degree of the corresponding node)
towards the new joining node. This is also true in our
algorithms; however, only nodes with the desired
degree respond, so the communication overhead is lower
than in the Gaian algorithm. In the HAPA algorithm,
the new joining node first selects a random node and
attempts to connect to it. Next, it randomly walks in
the network through neighbors until all its stubs are
filled. Even though HAPA is a localized algorithm,
each connection attempt by a new node succeeds only
with probability p(j), and only if the visited node
has an edge count lower than the hard cutoff value, so
the new node's connection attempt message needs to
travel a lot (sometimes a node is visited many times).
Thus, it results in a large messaging overhead (see
footnote 8). Fig. 5 clearly shows that the overhead of
our algorithms is the smallest. Considering this
result together with the almost perfect degree
distribution that SRA achieves for a given exponent
without any global information, we can clearly state
its superiority over the other algorithms. The
overhead of SDA is the same as that of SRA, and SDA
achieves a much better fit to the scale-free property,
but it may not be robust during communication failures
because it requires the total node count to be
maintained at every node.

Fig. 5. Communication Overhead (x-axis: node count)
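
The random-walk join attributed to HAPA above can be illustrated with a small simulation that counts connection attempts. The acceptance probability p(j) is not defined in this section, so the sketch assumes p(j) equals the visited node's share of the total degree; this choice, the function name `hapa_join` and the toy overlay are all assumptions made for illustration.

```python
import random

def hapa_join(graph, k, m, rng, max_steps=1_000_000):
    """Attach a new node with k stubs by walking the overlay.

    graph maps integer node ids to neighbor lists and is modified in place.
    Returns the number of connection attempts (one message per visited node).
    ASSUMPTION: an attempt at node j succeeds with probability
    degree(j) / total degree, and only if degree(j) is below the cutoff m.
    """
    total_degree = sum(len(nbs) for nbs in graph.values())
    new_node = max(graph) + 1
    current = rng.choice(list(graph))
    linked = []
    attempts = 0
    while len(linked) < k and attempts < max_steps:
        attempts += 1
        degree = len(graph[current])
        accepted = rng.random() < degree / total_degree
        if accepted and degree < m and current not in linked:
            linked.append(current)
        current = rng.choice(graph[current])  # hop to a random neighbor
    # Add the new edges only after the walk, so the walk stays on the
    # overlay as it was before the join.
    graph[new_node] = []
    for node in linked:
        graph[node].append(new_node)
        graph[new_node].append(node)
    return attempts

# Hypothetical toy overlay: a ring of 20 nodes; join with k = 2, m = 5.
overlay = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
attempts = hapa_join(overlay, k=2, m=5, rng=random.Random(1))
```

Under this assumed p(j), each attempt on the ring succeeds with probability 2/40, so many more than k attempts are typically needed, which is consistent with the large overhead reported for HAPA.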

8. The overhead of the HAPA algorithm in Fig. 5 does
not include the overhead that would be generated by
maintaining the total node count information at each
node of the network. Its overhead would be higher if
that were also included. We could not include that
cost since no detail is given about its implementation
in [2].

5.2 Summary of Contributions

As the simulation results show, with the introduction
of two new algorithms for P2P overlay networks, we
demonstrated that:

• Without using any global information (i.e., with
SRA), we can construct a sequence of growing
scale-free overlay topologies with lower overhead
than the currently used algorithms. Each of these
evolving topologies adheres nearly perfectly to the
scale-free degree distribution. This is contrary to
the statement in [2] indicating the necessity of some
global information to achieve close scale-freeness.
Moreover, the other algorithms, including those that
use global information, cannot show good
scale-freeness.

• The effect of the scale-free exponent, γ, is
significant in achieving high search efficiency in
different search algorithms (especially for weakly
connected networks, i.e., k = 2). While our algorithms
achieve the best hit ratios in FL search with γ = 3.5,
they achieve the best hit ratios in NF and RW searches
when γ is set to its lowest possible value (γ = 1.53
for m = 10 and γ = 2.46 for m = 50). Since our
algorithms can create a scale-free network with a
desired γ value, the value that gives the best search
efficiency for the given search algorithm can be used
to create the scale-free overlay topology. The other
algorithms can achieve high search efficiency only in
either FL (HAPA) or NF/RW (BA and Gaian).

• In limited scale-free networks, when m > 3k, there
is a unique γ with which all node degrees (including
the ones with degree m) in the sequence of networks of
growing size comply with the power-law distribution.

6 Conclusion 

In this paper, we introduced two new algorithms for
growing limited scale-free overlay topologies for
unstructured P2P networks. In extensive simulations,
we demonstrated the clear superiority of our
algorithms over well-known algorithms in the
literature. Our algorithms provide almost perfect
adherence to the scale-free property using zero or
limited global information and require less
communication during overlay construction than the
others do. They also provide higher search efficiency
for different search methods when a network is
constructed with the right parameters (i.e., γ). In
future work, we plan to develop algorithms that
maintain perfect scale-freeness without using global
information while nodes join and leave the graph at
the same time.